reservoir sampling について

Words near each other

・ "O" Is for Outlaw
・ "O"-Jung.Ban.Hap.
・ "Ode-to-Napoleon" hexachord
・ "Oh Yeah!" Live
・ "Our Contemporary" regional art exhibition (Leningrad, 1975)
・ "P" Is for Peril
・ "Pimpernel" Smith
・ "Polish death camp" controversy
・ "Pro knigi" ("About books")
・ "Prosopa" Greek Television Awards
・ "Pussy Cats" Starring the Walkmen
・ "Q" Is for Quarry
・ "R" Is for Ricochet
・ "R" The King (2016 film)
・ "Rags" Ragland
・ ! (album)
・ ! (disambiguation)
・ !!
・ !!!
・ !!! (album)
・ !!Destroy-Oh-Boy!!
・ !Action Pact!
・ !Arriba! La Pachanga
・ !Hero
・ !Hero (album)
・ !Kung language
・ !Oka Tokat
・ !PAUS3
・ !T.O.O.H.!
・ !Women Art Revolution

Dictionary Lists

mini英和辞書

翻訳と辞書　辞書検索 [ 開発暫定版 ]

スポンサードリンク

reservoir sampling ：ウィキペディア英語版

reservoir sampling

Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of ''k'' items from a list ''S'' containing ''n'' items, where ''n'' is either a very large or unknown number. Typically ''n'' is large enough that the list doesn't fit into main memory.
== Algorithm R ==
The most common example was labelled ''Algorithm R'' by Jeffrey Vitter in his paper on the subject. This simple O(''n'') algorithm as described in the ''Dictionary of Algorithms and Data Structures''〔( Dictionary of Algorithms and Data Structures )〕 consists of the following steps (assuming that the arrays are one-based, and that the number of items to select, ''k'', is smaller than the size of the source array, ''S''):

/
*
S has items to sample, R will contain the result
*/
ReservoirSample(S(), R())
// fill the reservoir array
for i = 1 to k
R() := S()
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R() := S()

The algorithm creates a "reservoir" array of size ''k'' and populates it with the first ''k'' items of ''S''. It then iterates through the remaining elements of ''S'' until ''S'' is exhausted. At the ''i''^th element of ''S'', the algorithm generates a random number ''j'' between 1 and ''i''. If ''j'' is less than or equal to ''k'', the ''j''^th element of the reservoir array is replaced with the ''i''^th element of ''S''. In effect, for all ''i'', the ''i''^th element of ''S'' is chosen to be included in the reservoir with probability ''k/i''. Similarly, at each iteration the ''j''^th element of the reservoir array is chosen to be replaced with probability ''1/k
* k/i'', which simplifies to ''1/i''. It can be shown that when the algorithm has finished executing, each item in ''S'' has equal probability (i.e. ''k/length(S)'') of being chosen for the reservoir.
To see this, consider the following proof by induction. After the (i-1)^th round, let us assume, the probability of a number being in the reservoir array is ''k/(i-1)''. Since the probability of the number being replaced in the i^thround is ''1/i'', the probability that it survives the i^th round is ''(i-1)/i''. Thus, the probability that a given number is in the reservoir after the i^th round is the product of these two probabilities, i.e. the probability of being in the reservoir after the (i-1)^th round, and surviving replacement in the i^th round. This is ''(k/(i-1))
* ((i-1)/i)=k/i''. Hence, the result holds for ''i'', and is therefore true by induction.

抄文引用元・出典: フリー百科事典『ウィキペディア（Wikipedia）』
■ウィキペディアで「reservoir sampling」の詳細全文を読む

スポンサードリンク

翻訳と辞書 : 翻訳のためのインターネットリソース